Specimen topography reconstruction

ABSTRACT

This method removes high frequency noise from shape data, significantly improves metrology system (10) performance and provides a very compact representation of the shape. This model-based method for wafer shape reconstruction from data measured by a dimensional metrology system (10) is best accomplished using the set of Zernike polynomials (matrix L). The method is based on decomposition of the wafer shape over a complete set of spatial functions. A weighted least squares fit is used to provide the best linear estimates of the decomposition coefficients (B_(nk)). The method is operable with data that is not taken at regular data points and generates a data field of Zernike coefficients that is much smaller than the original data field.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority of U.S. Provisional Patent Application No. 60/174,082, entitled SPECIMEN TOPOGRAPHY RECONSTRUCTION, filed Dec. 30, 1999, incorporated herein by reference.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

N/A

BACKGROUND OF THE INVENTION

Wafer shape is a geometric characteristic of a semiconductor wafer, which describes the position of the wafer's central plane surface in space. The bow, warp and other shape related parameters of semiconductor wafers must be within precise tolerances in order for wafers to be usable. The precision of a dimensional metrology (measurement) system must be tight enough to provide the required control over the quality of manufactured wafers.

The high accuracy metrology of test specimens, such as the topographic measurement of bow, warp, flatness, thickness etc. of such objects as semiconductor wafers, magnetic disks and the like, is impeded by the presence of noise in the output data. Depending on the inherent properties of the instrument and the environment, the data may have a noise content that displays larger peak to peak magnitude than the actual dimensions being measured. It is difficult to remove all sources of wafer vibration in a sensor based dimensional metrology system when the wafer moves between the sensors. The natural frequency of wafer vibration is of the order of tens to a few hundred Hertz, depending on wafer size and loading conditions, and the observed pattern of vibration has a spatial wavelength less than a few mm. If this noise is not removed, it directly affects the repeatability and reproducibility of the measurements of the system.

The measurements for wafer shape are typically taken at a plurality of points over the specimen surface. The positions of those points are not rigorously controlled between specimens. Therefore, the same data point may not be from the same exact location on each specimen tested by a particular metrology unit. This limits the usefulness of such noise elimination techniques as correlation analysis. Similarly the desire to process data for noise reduction from arbitrary shapes, particularly circular, reduces the attractiveness of high speed data systems such as Fast Fourier Transforms. Wafer shape is mostly a low spatial frequency characteristic. This makes it possible to remove vibration noise by using a low pass 2D spatial filter.

Convolution-based filters require a regular, evenly spaced data set that uses a priori information about the analytical continuation of the wafer shape beyond the wafer boundaries, e.g. the periodic behavior of the wafer shape. Because of this requirement for regular data and a priori information, conventional filters such as convolution techniques are not applicable for wafer shape vibration-noise removal. Fast Fourier transforms are an alternate high speed data processing method, but they are not well adapted to noise reduction processing from arbitrary non-rectilinear shapes, particularly circular shapes.

An analytical method for removing the noise content from metrology measurements of wafer specimens that accommodates the variability of data points is needed.

BRIEF SUMMARY OF THE INVENTION

This invention has application for wafer shape metrology systems where the wafer moves between two-dimensional sensors that scan it and the scan pattern is not necessarily evenly spaced in Cartesian co-ordinates.

The invention provides a method to reduce the noise in metrological data from a specimen's topography. The model-based method allows wafer shape reconstruction from data measured by a dimensional metrology system by quantifying the noise in the measurements. The method is based on decomposition of the wafer shape over the full set of spatial measurements. A weighted least squares fit provides the best linear estimate of the decomposition coefficients for a particular piece of test equipment. The fact that the wafer shape is predominantly a low spatial frequency object guarantees fast convergence. An important advantage of the use of the least squares fit method is the fact that a regular grid of data points is not required to calculate the coefficients. Zernike polynomials are preferred for wafer shape reconstruction, as they operate with data that is not taken at regular data points and that represents circular objects.

At least one set of raw data from a measurement is analyzed to obtain a characterizing matrix of the Zernike type for that particular instrument. A least squares fit on the singular value decomposition of the data is used to initially calculate the matrix characterizing the instrument. Thereafter, this matrix does not need to be recalculated unless factors change the errors in the measurement instrumentation.

Data characterizing the topography of a specimen, in the form of Zernike coefficients, can be sent with specimens or telecommunicated anywhere. Because the Zernike coefficients are a complete characterization and are efficient in using minimum data space, this method significantly improves metrology system performance by removing high frequency noise from the shape data and providing a very compact representation of the shape.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING

These and other objects, aspects and advantages of the present invention will become clear as the invention becomes better understood by referring to the following solely exemplary and non-limiting detailed description of the method thereof and to the drawings, wherein:

FIG. 1 shows apparatus for measuring the topography of a specimen, in particular of a semiconductor wafer;

FIG. 2 shows a visual scale image of specimen topography with noise;

FIG. 3 shows a visual scale image of specimen topography characteristic of the measurement apparatus;

FIG. 4 shows a visual scale image of specimen topography with noise reduction; and

FIG. 5 shows a graph of tighter consistency of measurement after use of the invention.

DETAILED DESCRIPTION OF THE INVENTION

According to the present invention, and as shown in FIG. 1, a metrology system 10 receives a cassette 12 of semiconductor wafers 14 for testing of surface properties, such as those noted above. The wafers 14 are measured in a physical test apparatus 16, such as any of the ADE Corporation's well-known measurement stations, the WAFERCHECK™ systems being one such.

The physical test apparatus 16 outputs data to a processor 20 on a communications line 18. The data is typically a vector of measured wafer artifacts, such as flatness height, developed during a spiral scan of the wafer. The present invention operates to eliminate or reduce the noise from the wafer measurement system.

The raw noisy data is typically stored in a memory area 22 where its vector can be represented as W(ρ, θ), where ρ is the normalized (r/radius) radial location of each measurement point, and θ is the angle in polar coordinates of the measurement point. The processor 20 performs a transform on this data using a previously calculated matrix, L, which represents the noise characteristic of the measurement station 10. This transform outputs the coefficients of a function that gives the noise reduced topography of the specimen at each desired point. The specimen shape is normalized for noise data alone. The outputs are fed to an input/output interface 30 that may transmit the output to a remote location. The coefficients may also be transmitted from the I/O unit 30 to remote locations, or sent along with the specimen on a data carrier, the Internet or any other form as desired.

The previously calculated matrix, L, is advantageously represented as a Zernike polynomial. Zernike polynomials were introduced [F. Zernike, Physica, 1 (1934), 689] and used to describe aberration and diffraction in theoretical and applied optics. These 2D polynomials represent a complete orthogonal set of functions over the unit circle. Any differentiable function defined over the finite radius circle can be represented as a linear combination of Zernike polynomials. There is no need for a priori information as is the case for convolution techniques. Zernike polynomials are invariant relative to rotation of the coordinate system around an axis normal to the wafer plane. This invariance aids in shape data analysis, especially for data having orientation dependencies. The spectrum of Zernike decomposition coefficients has analogues to power spectral density in Fourier space. The invariance character is that it loses spatial significance as a Fourier series loses time relationships.

The transform from shape W(r,θ) onto Zernike functional space (n, k) is expressed as:

$\begin{matrix}{W(r,\theta) = \sum\limits_{n,k} B_{nk}\, R_{n}^{k}(\rho)\, \exp\left( - i k \theta \right),} & (1)\end{matrix}$

where
-   (r,θ) are the data point polar coordinates,
-   ρ = r/(wafer radius),
-   B_(nk) is the decomposition coefficient, and

$\begin{matrix}{R_{n}^{k}(\rho) = \sum\limits_{s = 0}^{(n - k)/2} \left( - 1 \right)^{s}\, \frac{(n - s)!}{s!\, \left( \frac{n + k}{2} - s \right)!\, \left( \frac{n - k}{2} - s \right)!}\, \rho^{n - 2s}} & (2)\end{matrix}$

Where n and k and s are arbitrary variables of synthetic space.
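As a minimal sketch, the radial polynomial of equation (2) can be evaluated directly from its finite sum. The function name `zernike_radial` is hypothetical. Using |k| for negative k, and returning zero when n − k is odd, are assumptions taken from the usual Zernike convention rather than statements in the text.

```python
from math import factorial


def zernike_radial(n, k, rho):
    """Evaluate R_n^k(rho) from equation (2) for a scalar rho in [0, 1]."""
    k = abs(k)  # assumption: negative k reuses the radial part of |k|
    if k > n or (n - k) % 2:
        return 0.0  # assumption: no term is defined unless n - k is even
    total = 0.0
    for s in range((n - k) // 2 + 1):
        numerator = (-1) ** s * factorial(n - s)
        denominator = (
            factorial(s)
            * factorial((n + k) // 2 - s)
            * factorial((n - k) // 2 - s)
        )
        total += numerator / denominator * rho ** (n - 2 * s)
    return total


# R_2^0(rho) = 2*rho**2 - 1, so the value at rho = 0.5 is -0.5
print(zernike_radial(2, 0, 0.5))
```

Every radial polynomial with n − k even evaluates to 1 at the wafer edge (ρ = 1), which gives a quick sanity check on an implementation.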

The decomposition coefficients B_(nk) are calculated from the system of linear equations (1). This system is overdetermined, in that the number of equations (one for each data point) is two orders of magnitude greater than the number of coefficients B_(nk) (unknowns).

The B_(nk) decomposition coefficients can be kept to a small number, typically around 100, by selection of the limits on n, and on k, which varies from −n to +n integrally. The data range typically is large enough to accurately sample the noise being cancelled, while small enough to be manageable. The spatial filtering is a result of the limit on the range for s, which is allowed to grow in the range 0 . . . n. For wafer metrology, an n of about 10 filters out the noise component described above for the ADE Corporation equipment.
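The number of unknowns implied by a given cutoff on n can be tallied directly, as in the sketch below. The helper `zernike_indices` is hypothetical. The text does not say whether every integer k from −n to +n is counted or only those with n − k even (the case in which the upper limit (n − k)/2 of equation (2) is an integer), so both counts are shown.

```python
def zernike_indices(n_max, even_only=True):
    """List (n, k) pairs with 0 <= n <= n_max and -n <= k <= n.

    With even_only=True only pairs with n - k even are kept (assumption:
    the usual Zernike convention); otherwise every integer k is listed.
    """
    step = 2 if even_only else 1
    return [(n, k) for n in range(n_max + 1) for k in range(-n, n + 1, step)]


print(len(zernike_indices(10)))                   # 66
print(len(zernike_indices(10, even_only=False)))  # 121
```

Either count is of the order of 100 for n = 10, so the system of equations (1) stays small compared with the number of measured points.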

The system of equations (1) is solved using a weighted least squares fit, because a weighted least squares fit overcomes measurement errors in the input data. Weightings are determined based on the reliability of data; when data is more reliable (exhibits smaller variances), it is weighted more heavily. The calculated covariance matrix is used to assign weight to data points. Using the statistical weightings improves the fit of the output.

According to Strang [Strang, G., Introduction to Applied Mathematics, Wellesley-Cambridge, 1986, p. 398], the best unbiased (without preconditions) solution of the system (1) can be written as

$\begin{matrix}{B = \left( A^{T}\Sigma^{-1}A \right)^{-1}A^{T}\Sigma^{-1}W,} & (3)\end{matrix}$

where,

-   B is the vector of decomposition coefficients,
-   A is the matrix of {R_(n)^(k)(ρ_(j))exp(−ikθ_(j))}, j=1,2, . . . , number of measured points,
-   T stands for the transpose of a matrix,
-   Σ⁻¹ is the inverse of the covariance matrix Σ, and
-   W is the vector of measured values W(ρ_(j),θ_(j)).

The matrix L=(A^(T)Σ⁻¹A)⁻¹A^(T)Σ⁻¹ in front of W in solution (3) does not depend on actual measured values. Therefore, for a given scan pattern it can be pre-calculated and stored in a computer memory. Matrix value L will need to be recalculated each time the error function of the instrument changes. The matrix value L is calculated using the Singular Value Decomposition (SVD) method [Forsythe, G. E., Moler, C. B., Computer Solution of Linear Algebraic Systems, Prentice-Hall, 1971]. SVD does not require evenly sampled data points.
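One way to pre-calculate L for a fixed scan pattern is sketched below with NumPy, whose `numpy.linalg.pinv` computes a pseudoinverse through an SVD. The function names are hypothetical, and the sketch makes three assumptions that the text does not state: the covariance matrix Σ is diagonal (one variance per point), only index pairs with n − k even are used, and the transpose in solution (3) is read as a conjugate transpose because the basis functions are complex.

```python
from math import factorial

import numpy as np


def radial(n, k, rho):
    """R_n^k(rho) of equation (2) for an array rho; |k| is used (assumption)."""
    k = abs(k)
    out = np.zeros_like(rho, dtype=float)
    for s in range((n - k) // 2 + 1):
        coeff = (-1) ** s * factorial(n - s) / (
            factorial(s) * factorial((n + k) // 2 - s) * factorial((n - k) // 2 - s)
        )
        out += coeff * rho ** (n - 2 * s)
    return out


def design_matrix(rho, theta, n_max):
    """Matrix A: one row per measured point, one column per (n, k) pair."""
    columns = [
        radial(n, k, rho) * np.exp(-1j * k * theta)
        for n in range(n_max + 1)
        for k in range(-n, n + 1, 2)  # assumption: only n - k even
    ]
    return np.column_stack(columns)


def fit_matrix(rho, theta, n_max, variances):
    """L of solution (3) for a diagonal covariance, via an SVD pseudoinverse."""
    a = design_matrix(rho, theta, n_max)
    w = 1.0 / np.sqrt(np.asarray(variances, dtype=float))  # whitening weights
    # Whiten the rows of A, pseudo-invert, then fold the weights back in.
    return np.linalg.pinv(w[:, None] * a) * w[None, :]
```

With `L = fit_matrix(rho, theta, n_max, variances)` stored for the scan pattern, the coefficients for any new measurement vector follow from the single product `L @ W`.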

Once L is determined, only one matrix multiplication is required to calculate the unknowns in B. This procedure, when implemented, is as fast as a Fast Fourier Transform but avoids the 2D Fast Fourier Transform's difficulties dealing with the wafers' circular boundaries and any non-Cartesian scan pattern.

The processor 20 of FIG. 1 can output either the Zernike coefficients of the actual wafer, or the output can be in the form of W(r,θ) that gives the noise reduced topography of the specimen or wafer at any desired point. W(r,θ) can be calculated from the Zernike coefficients.
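Evaluating W(r,θ) from a set of coefficients is a direct application of equation (1), as the sketch below illustrates. The function `reconstruct`, the dictionary layout of the coefficients and the 100 mm wafer radius are hypothetical. Taking the real part of the sum assumes the coefficients of +k and −k are complex conjugates of each other, as they would be for real-valued shape data.

```python
from math import factorial

import numpy as np


def radial(n, k, rho):
    """R_n^k(rho) of equation (2) for an array rho; |k| is used (assumption)."""
    k = abs(k)
    out = np.zeros_like(rho, dtype=float)
    for s in range((n - k) // 2 + 1):
        coeff = (-1) ** s * factorial(n - s) / (
            factorial(s) * factorial((n + k) // 2 - s) * factorial((n - k) // 2 - s)
        )
        out += coeff * rho ** (n - 2 * s)
    return out


def reconstruct(coeffs, r, theta, wafer_radius):
    """Sum equation (1) at the requested points.

    coeffs maps (n, k) to B_nk; r and theta are sequences of polar coordinates.
    """
    rho = np.atleast_1d(np.asarray(r, dtype=float)) / wafer_radius
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    total = np.zeros(rho.shape, dtype=complex)
    for (n, k), b in coeffs.items():
        total += b * radial(n, k, rho) * np.exp(-1j * k * theta)
    return total.real  # assumption: conjugate-symmetric coefficients


# Hypothetical wafer: a constant offset plus a bow term, 100 mm radius
shape = {(0, 0): 1.0, (2, 0): 0.5}
print(reconstruct(shape, [0.0, 100.0], [0.0, 0.0], 100.0))  # centre, then edge
```

Because the points passed to `reconstruct` need not coincide with the measured points, the same coefficients can be resampled on any grid at a remote location.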

The suggested method was first implemented and verified in a simulated environment. ANSYS finite element analysis software was used to generate wafer vibration modes and natural frequencies for a number of wafer diameters and loading conditions. Then the wafer shape measurement process, as affected by vibration, was modeled and simulated in Matlab. Generated shape data were processed according to the suggested method, yielding simulated shape and calibration information.
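A toy version of such a simulation can be sketched in a few lines of Python. It is not the ANSYS and Matlab model described above: the spiral scan, the bow and tilt amplitudes, the vibration frequency and the noise level are all invented for illustration, the fit is unweighted, and only index pairs with n − k even are used.

```python
from math import factorial

import numpy as np


def radial(n, k, rho):
    """R_n^k(rho) of equation (2) for an array rho; |k| is used (assumption)."""
    k = abs(k)
    out = np.zeros_like(rho, dtype=float)
    for s in range((n - k) // 2 + 1):
        coeff = (-1) ** s * factorial(n - s) / (
            factorial(s) * factorial((n + k) // 2 - s) * factorial((n - k) // 2 - s)
        )
        out += coeff * rho ** (n - 2 * s)
    return out


def design_matrix(rho, theta, n_max):
    """Matrix A: one row per scan point, one column per (n, k) pair."""
    return np.column_stack([
        radial(n, k, rho) * np.exp(-1j * k * theta)
        for n in range(n_max + 1)
        for k in range(-n, n + 1, 2)
    ])


rng = np.random.default_rng(0)

# Hypothetical spiral scan: 4000 points, 40 turns from centre to edge
t = np.linspace(0.0, 1.0, 4000)
rho, theta = t, 2.0 * np.pi * 40.0 * t

# Invented low-order shape (bow plus tilt, in microns) and scan-time noise
true_shape = 10.0 * (2.0 * rho**2 - 1.0) + 2.0 * rho * np.cos(theta)
vibration = 1.5 * np.sin(2.0 * np.pi * 1000.0 * t)
measured = true_shape + vibration + 0.2 * rng.standard_normal(t.size)

# Unweighted least squares fit up to n = 10, then resynthesis of the shape
A = design_matrix(rho, theta, 10)
L = np.linalg.pinv(A)          # pre-calculated once per scan pattern
B = L @ measured               # one matrix multiplication per wafer
filtered = (A @ B).real

rms_before = np.sqrt(np.mean((measured - true_shape) ** 2))
rms_after = np.sqrt(np.mean((filtered - true_shape) ** 2))
print(f"rms error before: {rms_before:.3f}, after: {rms_after:.3f}")
```

In this toy setting the residual error after the fit comes only from the part of the noise that happens to project onto the low-order basis, which is why the truncation at n of about 10 acts as a low pass spatial filter.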

Later, shape reconstruction was applied to real world wafer shape data across an ADE platform to confirm the utility of the method. FIGS. 2–5 illustrate the benefit of the present invention in removing noise from the scan of a specimen, shown in topographic presentation in FIG. 2. In FIG. 2, both the noise inherent in the measurement instrument and the irregularities of the wafer are integrated. The wafer appears to have ridges of high points 200 that radiate from the center of the wafer, some areas of nominal height 210, and diffuse regions of high spots 230. It would be difficult to plan a smoothing operation on the wafer shown.

In FIG. 3, the noise of the measurement instrument is presented. Here, it is evident that, from a nominal height center 300, arced radial bands 310 extend to the circumference of the specimen 340. Some arcs 310 are compact, while others 320 have a more diffuse aspect. This topographic chart illustrates how the instrument vibrates the specimen in the process of rotating it for scanning. Comparing the scales for FIGS. 2 and 3 shows that the magnitude of the vibration noise is less than the overall irregularity in the specimen. FIG. 4 shows the same specimen's topography with the noise of FIG. 3 removed. Now it can be seen that the specimen has 3 high spots 400. Two of the high spots 400 exhibit a sharp gradient 410 between the nominal height of the specimen 430 and the high spot 400. The third high spot 400 exhibits a more gradual gradient 420 between the nominal height 430 and the high spot. Further processing of this topography can be planned.

FIG. 5 illustrates the repeatability of the noise reduced data. For the ten different measurement points, solid triangles 500, representing filtered data, show a bow of between approximately 10 and 11 microns. The solid squares 510, representing noisy data, show a bow of between approximately 9.5 and 12 microns.

The present invention operates to eliminate or reduce noise from noisy data measurements. While the description has exemplified its application to a wafer measurement system, it has application to other flat structures such as memory disks.

Having described preferred embodiments of the invention it will now become apparent to those of ordinary skill in the art that other embodiments incorporating these concepts may be used. Accordingly, it is submitted that the invention should not be limited by the described embodiments but rather should only be limited by the spirit and scope of the appended claims.

1. A process for noise reduction from noisy data representing an artifact at sample points in two dimensional space of a wafer specimen, comprising the steps of: receiving said noisy data as a vector, each element of which corresponds to one sample point; and calculating coefficients of a polynomial which converts said noisy data vector to a two dimensional function continuously representing the artifact in the two dimensional space, wherein said noisy data is obtained using a wafer measurement apparatus, said noise being induced in said noisy data by movement of said wafer specimen within said wafer measurement apparatus, and wherein said calculating step includes mathematically multiplying said data vector by a matrix representing a noise characteristic of said wafer measurement apparatus to achieve said noise reduction from said noisy data.
2. The process of claim 1 wherein said sample points lack regular geometrically prescribed locations on said wafer specimen.
3. The process of claim 1 wherein said wafer specimen is a non-rectilinear specimen.
4. The process of claim 1 wherein the sample points have a sufficiency to represent the spatial frequency of the noise to be reduced.
5. The process of claim 1 wherein said polynomial is a Zernike polynomial.
6. The process of claim 1 wherein said calculated coefficients are fewer in number than the number of sample points.
7. The process of claim 1 wherein said calculating step includes mathematically multiplying said data vector by the matrix representing the noise characteristic of said measuring apparatus, and wherein said matrix represents a least squares fit between said data vector and the polynomial.
8. The process of claim 7 wherein said matrix is a single value decomposition of said two dimensional space as applied to said apparatus.
9. The process of claim 1 further comprising the step of calculating specimen spatial artifacts from said polynomial for one or more points in said two dimensional space.
10. The process of claim 9 further comprising the step of transmitting said coefficients to a remote location prior to the calculation of spatial artifacts from said polynomial.
11. A process for generating a noise correcting matrix for a wafer measurement apparatus, comprising the steps of: receiving data representative of artifacts in two dimensional space of a wafer specimen obtained by said wafer measurement apparatus, each data point associated with a data position, wherein movement of said wafer specimen within said wafer measurement apparatus induces noise in said data; and calculating a specimen-independent noise compensating matrix as a function of said data position in two dimensional space on said wafer specimen, wherein said matrix represents a noise characteristic of said wafer measurement apparatus, and wherein noise reduction in said data is achieved by mathematically multiplying said data by said matrix.
12. The process of claim 11 wherein said calculating step applies least squares fit analysis.
13. The process of claim 11 wherein said matrix is of the form of a multiplier of Zernike polynomial decomposition coefficients.
14. An apparatus for noise reduction from noisy data representing an artifact at sample points in two dimensional space of a wafer specimen, comprising: means for receiving said noisy data as a vector, each element of which corresponds to one sample point, and means for calculating coefficients of a polynomial which converts said noisy data vector to a two dimensional function continuously representing the artifact in the two dimensional space, wherein said noisy data is obtained using a wafer measurement apparatus, said noise being induced in said noisy data by movement of said wafer specimen within said wafer measurement apparatus, and wherein said calculating means includes means for mathematically multiplying said data vector by a matrix representing a noise characteristic of said wafer measurement apparatus to achieve said noise reduction from said noisy data.
15. The apparatus of claim 14 wherein said wafer specimen is a non-rectilinear specimen.
16. The apparatus of claim 14 wherein the sample points have a sufficiency to represent the spatial frequency of the noise to be reduced.
17. The apparatus of claim 14 wherein said polynomial is a Zernike polynomial.
18. The apparatus of claim 14 wherein said calculated coefficients are fewer in number than the number of data points.
19. The apparatus of claim 14 wherein said matrix represents a least squares fit between the data vector and the polynomial.
20. The apparatus of claim 19 wherein said matrix is a single value decomposition of said two dimensional space as applied to said measuring apparatus.
21. The apparatus of claim 14 further comprising means for calculating specimen spatial artifacts from said polynomial for one or more points in said two dimensional space.
22. The apparatus of claim 21 further comprising means for transmitting said coefficients to a remote location prior to the calculation of spatial artifacts from said polynomial.
23. Apparatus for generating a noise correcting matrix for a wafer measurement apparatus, comprising: means for receiving data representative of artifacts in two dimensional space of a wafer specimen obtained by said wafer measurement apparatus, each data point associated with a data position, wherein movement of said wafer specimen within said wafer measurement apparatus induces noise in said data; and means for calculating a specimen-independent noise compensating matrix as a function of data position in two dimensional space on said wafer specimen, wherein said matrix represents a noise characteristic of said wafer measurement apparatus, and wherein said calculating means includes means for mathematically multiplying said data by said matrix to achieve noise reduction in said data.
24. The apparatus of claim 23 wherein said calculating means applies least squares fit analysis.
25. The apparatus of claim 23 wherein said matrix is of the form of a multiplier of a Zernike polynomial without decomposition coefficients.
26. The apparatus of claim 14 wherein said means for calculating coefficients is a computer.
27. A model-based method of wafer shape reconstruction comprising: obtaining a set of noisy data points representing the wafer shape by a wafer measurement apparatus, wherein movement of said wafer within said wafer measurement apparatus induces noise in said noisy data; using a complete set of Zernike polynomials as a shape functional space; applying a weighted least squares fit between said noisy data points and a set of data points calculated from said Zernike polynomials, wherein said weighted least squares fit is represented by a matrix, and said matrix represents a noise characteristic of said wafer measurement apparatus; and finding decomposition coefficients for said wafer shape, wherein noise reduction is achieved in said noisy data by mathematically multiplying said set of noisy data points by said matrix.
28. The model-based method of claim 27 wherein said decomposition coefficients are a compact wafer shape data representation.
29. The model-based method of claim 27 wherein said set of noisy data points form a scanning pattern that is not necessarily evenly spaced.
30. The apparatus of claim 14, wherein said sample points lack regular geometrically prescribed locations on said wafer specimen.
31. The process of claim 1 wherein the movement of said wafer specimen within said wafer measurement apparatus comprises a circular rotation of said wafer specimen.
32. The process of claim 11 wherein the movement of said wafer specimen within said wafer measurement apparatus comprises a circular rotation of said wafer specimen.
33. The apparatus of claim 14 wherein the movement of said wafer specimen within said wafer measurement apparatus comprises a circular rotation of said wafer specimen.
34. The apparatus of claim 23 wherein the movement of said wafer specimen within said wafer measurement apparatus comprises a circular rotation of said wafer specimen.
35. The method of claim 27 wherein the movement of said wafer within said wafer measurement apparatus comprises a circular rotation of said wafer.